Personal Loan Campaign Modelling

Key analysis objectives of the Project

At AllLife Bank, a model needs to be built to help the marketing department identify potential customers who have a higher probability of purchasing a personal loan.

  1. To predict whether a liability customer will buy a personal loan or not.
  2. To identify which variables are most significant.
  3. To identify which customer segments should be targeted more.

Data Context

  1. ID: Customer ID
  2. Age: Customer’s age in completed years
  3. Experience: Years of professional experience
  4. Income: Annual income of the customer (in thousand dollars)
  5. ZIP Code: Home address ZIP code
  6. Family: Family size of the customer
  7. CCAvg: Average spending on credit cards per month (in thousand dollars)
  8. Education: Education level. 1: Undergrad; 2: Graduate; 3: Advanced/Professional
  9. Mortgage: Value of house mortgage, if any (in thousand dollars)
  10. Personal_Loan: Did this customer accept the personal loan offered in the last campaign?
  11. Securities_Account: Does the customer have a securities account with the bank?
  12. CD_Account: Does the customer have a certificate of deposit (CD) account with the bank?
  13. Online: Does the customer use internet banking facilities?
  14. CreditCard: Does the customer use a credit card issued by any other bank (excluding AllLife Bank)?

Importing all the required packages for the project:

Read in the dataset using pandas

View the top 10 rows of the dataset

It is better to also check random rows rather than only the top rows, because the first rows of a file may not be representative of the whole dataset.
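A minimal pandas sketch of the two ways of looking at rows. The real file name is not given here, so a small synthetic DataFrame (random values, not the bank's data) stands in for the loaded dataset df.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the loan dataset (illustrative values only, not the real data)
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "ID": np.arange(1, 101),
    "Age": rng.integers(23, 68, size=100),
    "Income": rng.integers(8, 225, size=100),
})

# head() always returns the first rows, which may reflect how the file was sorted
top_rows = df.head(10)

# sample() draws rows at random; random_state makes the draw reproducible
random_rows = df.sample(n=10, random_state=1)
print(top_rows)
print(random_rows)
```

Fixing random_state means the same "random" rows come back on every run, which keeps the notebook's output stable.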

The dataset looks consistent with the description provided in the data dictionary. Some columns contain values that need to be processed before the data can be used effectively for analysis.

Check the shape of the provided data

There are 5000 rows (observations) and 14 columns.

Check datatype count

Get the complete info of the dataset

Describe all columns
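The four checks above correspond to standard pandas calls. A sketch on a synthetic frame with a few of the data dictionary's columns (the values are random and only illustrative):

```python
import numpy as np
import pandas as pd

# Synthetic frame using some column names from the data dictionary (random values)
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "Age": rng.integers(23, 68, size=n),
    "Experience": rng.integers(-3, 44, size=n),
    "Income": rng.integers(8, 225, size=n),
    "CCAvg": rng.uniform(0, 10, size=n).round(1),
    "Personal_Loan": rng.integers(0, 2, size=n),
})

rows, cols = df.shape                    # number of observations and columns
dtype_counts = df.dtypes.value_counts()  # how many columns have each data type
df.info()                                # column names, non-null counts and dtypes
summary = df.describe().T                # count, mean, std, min, quartiles, max per column
missing = df.isnull().sum()              # missing values per column

print(rows, cols)
print(dtype_counts)
print(summary[["min", "50%", "max"]])
```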

Data Analysis:

  1. There are 5000 records and 14 columns in the dataset.
  2. There are no missing values in any column.
  3. Age ranges from 23 to 67, and the mean age is 45.
  4. Experience ranges from -3 to 43. The negative experience values need to be investigated as a data issue.
  5. Income ranges from 8K to 224K. The median value is 64K.
  6. Most customers belong to small families.

Data Analysis: Univariate Analysis

Most customers are between 35 and 55 years old.

Most customers have an income between 40K and 100K.

Experience ranges from -3 to 43 years. Negative experience is not an expected value and may be the result of incorrect data extraction, so the records with negative values need to be identified and their experience changed to 0.

Family size is evenly distributed, and every family in the data has at most 4 members.

Most customers spend between 0.5K and 2.5K per month on credit cards.

Undergraduates are the largest group, followed by graduates and then advanced/professional customers.

House mortgage values range from 0 to almost 600K; most mortgages are below 250K.

More than 4.5K customers did not take the personal loan offered in the last campaign.

About 4.5K customers do not have a securities account.

More than 4.5K customers do not have a certificate of deposit (CD) account with the bank.

About 2K customers (40%) don't use internet banking.

About 3.5K customers do not use a credit card issued by another bank.

Data Analysis: Bivariate Analysis

The pair plot shows the pairwise relationships and distributions of the columns in the dataset, giving an overall view of the data.

Customers aged <= 25 years or >= 65 years don't have personal loans.

Customers with >= 41 years of experience don't have personal loans.

The higher the income, the higher the chance of taking a personal loan.

Most personal loans were taken by families with 2 or more members.

Credit card spending is positively correlated with the chance of taking a personal loan.

Graduates and advanced/professional customers make up the majority of personal loan customers.

The higher the mortgage, the higher the chance of a customer taking a personal loan.

Customers with a securities account have a higher chance of taking a personal loan.

Customers with a CD account have a much higher chance of taking a personal loan.

There is not much difference in the chance of taking a personal loan between customers who use internet banking and those who don't.

There is not much difference in the chance of taking a personal loan between customers who use a credit card from another bank and those who don't.
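Statements such as "customers with a CD account have a much higher chance of taking a personal loan" compare the loan rate within each group. One way to compute that rate is a row-normalised crosstab. The sketch below uses synthetic data in which the effect is built in by assumption, so its numbers say nothing about the real dataset.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
n = 1000
cd_account = rng.integers(0, 2, size=n)
# Synthetic assumption: customers with a CD account accept the loan more often
loan_prob = np.where(cd_account == 1, 0.45, 0.05)
personal_loan = (rng.random(n) < loan_prob).astype(int)
df = pd.DataFrame({"CD_Account": cd_account, "Personal_Loan": personal_loan})

# Row-normalised crosstab: share of loan takers (column 1) within each CD_Account group
rates = pd.crosstab(df["CD_Account"], df["Personal_Loan"], normalize="index")
print(rates)

# The same comparison as one number per group: the mean of a 0/1 column is a rate
loan_rate = df.groupby("CD_Account")["Personal_Loan"].mean()
print(loan_rate)
```

The same two lines work for any of the binary columns (Securities_Account, Online, CreditCard) by swapping the grouping column.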

Insights based on EDA

Univariate Assessments:

  1. Most customers are between 35 and 55 years old.
  2. Most customers have an income between 40K and 100K.
  3. Experience ranges from -3 to 43 years. Negative experience is not an expected value and may be the result of incorrect data extraction, so the records with negative values need to be identified and their experience changed to 0.
  4. Family size is evenly distributed, and every family in the data has at most 4 members.
  5. Most customers spend between 0.5K and 2.5K per month on credit cards.
  6. Undergraduates are the largest group, followed by graduates and then advanced/professional customers.
  7. House mortgage values range from 0 to almost 600K; most mortgages are below 250K.
  8. More than 4.5K customers did not take the personal loan offered in the last campaign.
  9. About 4.5K customers do not have a securities account.
  10. More than 4.5K customers do not have a certificate of deposit (CD) account with the bank.
  11. About 2K customers (40%) don't use internet banking.
  12. About 3.5K customers do not use a credit card issued by another bank.

Bivariate Assessments:

  1. Customers aged <= 25 years or >= 65 years don't have personal loans.
  2. Customers with >= 41 years of experience don't have personal loans.
  3. The higher the income, the higher the chance of taking a personal loan.
  4. Most personal loans were taken by families with 2 or more members.
  5. Credit card spending is positively correlated with the chance of taking a personal loan.
  6. Graduates and advanced/professional customers make up the majority of personal loan customers.
  7. The higher the mortgage, the higher the chance of a customer taking a personal loan.
  8. Customers with a securities account have a higher chance of taking a personal loan.
  9. Customers with a CD account have a much higher chance of taking a personal loan.
  10. There is not much difference in the chance of taking a personal loan between customers who use internet banking and those who don't.
  11. There is not much difference in the chance of taking a personal loan between customers who use a credit card from another bank and those who don't.

Data Pre-processing

Find the number of records with experience less than 0 years

There are 52 records with experience less than 0, and they need to be corrected.

The 52 records with negative experience were updated to 0.
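A sketch of this fix in pandas on a few made-up rows. clip(lower=0) is one way to do it; df.loc[df["Experience"] < 0, "Experience"] = 0 is an equivalent alternative.

```python
import pandas as pd

# Toy frame: made-up rows, some with negative experience
df = pd.DataFrame({"Age": [24, 25, 45, 60, 23], "Experience": [-1, -3, 20, 35, -2]})

# Count the invalid records before fixing them
n_negative = (df["Experience"] < 0).sum()

# Replace negative experience with 0; clip(lower=0) leaves valid values unchanged
df["Experience"] = df["Experience"].clip(lower=0)
print(n_negative, df["Experience"].tolist())
```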

Model building - Logistic Regression

Model performance evaluation
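A minimal scikit-learn sketch of fitting and evaluating a logistic regression. make_classification generates a synthetic, imbalanced stand-in (roughly 10% positives) for the loan data, so the scores it prints are not this notebook's results. Recall is reported alongside accuracy because, on imbalanced data, a model can reach high accuracy while still missing most of the customers who would take a loan.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced stand-in for the loan data (about 10% positives)
X, y = make_classification(n_samples=2000, n_features=8, n_informative=5,
                           weights=[0.9, 0.1], random_state=1)
# stratify=y keeps the same class ratio in the training and test sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=1)

# max_iter is raised so the default lbfgs solver converges; other settings are defaults
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

pred = model.predict(X_test)
scores = {
    "accuracy": accuracy_score(y_test, pred),
    "recall": recall_score(y_test, pred),
    "precision": precision_score(y_test, pred, zero_division=0),
    "f1": f1_score(y_test, pred, zero_division=0),
}
print(scores)
```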

The model will be further improved by using a decision tree.

Build Decision Tree Model

DecisionTreeClassifier will be used with the default 'gini' criterion for splitting.
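A sketch of the default tree on synthetic data. The helper gini (a name introduced here for illustration) computes the Gini impurity, 1 minus the sum of squared class proportions, and the check shows it matches the impurity scikit-learn stores for the root node. Because no depth limit is set, the tree keeps splitting until its leaves are pure (or cannot be split further), which is why an unpruned tree tends to overfit.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


def gini(labels):
    """Gini impurity of a set of class labels: 1 - sum of squared class proportions."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)


# Synthetic, imbalanced stand-in for the loan data
X, y = make_classification(n_samples=2000, n_features=8, n_informative=5,
                           weights=[0.9, 0.1], random_state=1)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=1)

# criterion="gini" is the default; it is spelled out here for clarity
tree = DecisionTreeClassifier(criterion="gini", random_state=1)
tree.fit(X_train, y_train)

# The impurity stored for the root node equals the Gini impurity of the training labels
root_impurity = tree.tree_.impurity[0]
print("root gini:", root_impurity, "manual:", gini(y_train))
print("depth:", tree.get_depth(), "leaves:", tree.get_n_leaves())
print("train accuracy:", tree.score(X_train, y_train),
      "test accuracy:", tree.score(X_test, y_test))
```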

Functions to calculate different metrics and confusion matrix
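The notebook's own helper functions are not shown, so the two functions below (performance_table and confusion_table) are hypothetical names for the same idea: score a fitted model on a given set, and lay out the confusion matrix with labelled rows and columns. A toy threshold model on six made-up points stands in for a fitted tree.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score)


def performance_table(model, X, y):
    """Hypothetical helper: main classification metrics as a one-row DataFrame."""
    pred = model.predict(X)
    return pd.DataFrame({
        "Accuracy": [accuracy_score(y, pred)],
        "Recall": [recall_score(y, pred)],
        "Precision": [precision_score(y, pred, zero_division=0)],
        "F1": [f1_score(y, pred, zero_division=0)],
    })


def confusion_table(model, X, y):
    """Hypothetical helper: confusion matrix counts with readable labels."""
    cm = confusion_matrix(y, model.predict(X), labels=[0, 1])
    return pd.DataFrame(cm, index=["Actual 0", "Actual 1"],
                        columns=["Predicted 0", "Predicted 1"])


class ThresholdModel:
    """Toy stand-in for a fitted classifier: predicts 1 when the first feature exceeds 0.5."""
    def predict(self, X):
        return (np.asarray(X)[:, 0] > 0.5).astype(int)


X_demo = np.array([[0.9], [0.2], [0.7], [0.1], [0.6], [0.4]])
y_demo = np.array([1, 0, 0, 0, 1, 1])
perf = performance_table(ThresholdModel(), X_demo, y_demo)
cm = confusion_table(ThresholdModel(), X_demo, y_demo)
print(perf)
print(cm)
```

Calling the same pair of helpers on the training set and on the test set gives directly comparable rows, which is how a gap such as the recall difference discussed below shows up.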

Checking model performance on training set

Checking model performance on test set

There is a large gap between the recall scores on the training and test sets, which indicates that the model is overfitting.

Visualizing the Decision Tree

Reducing overfitting

Using GridSearch for hyperparameter tuning of the tree model
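A minimal sketch with scikit-learn's GridSearchCV on the same kind of synthetic data. The parameter grid is an illustrative assumption, not the grid used in this notebook; scoring on recall follows the notebook's focus on recall.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import recall_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic, imbalanced stand-in for the loan data
X, y = make_classification(n_samples=2000, n_features=8, n_informative=5,
                           weights=[0.9, 0.1], random_state=1)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=1)

# Illustrative grid: limiting depth and leaf size restricts how closely the tree fits noise
param_grid = {
    "max_depth": [2, 3, 4, 5, 6],
    "min_samples_leaf": [1, 5, 10],
    "max_leaf_nodes": [5, 10, 20],
}

# 5-fold cross-validation on the training set, ranking candidates by recall
search = GridSearchCV(DecisionTreeClassifier(random_state=1), param_grid,
                      scoring="recall", cv=5)
search.fit(X_train, y_train)

# best_estimator_ is refit on the whole training set with the best parameters
best_tree = search.best_estimator_
print(search.best_params_)
print("test recall:", recall_score(y_test, best_tree.predict(X_test)))
```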

Checking performance on training set

Visualizing the Decision Tree

Cost Complexity Pruning
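A sketch of minimal cost-complexity pruning with scikit-learn: cost_complexity_pruning_path returns the effective alphas at which subtrees of the full tree are pruned, one tree is fitted per alpha, and the alpha with the best recall on a held-out validation split is kept. Choosing alpha on a validation split rather than on the test set is a design choice made for this sketch, and the data are synthetic.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic, imbalanced stand-in for the loan data
X, y = make_classification(n_samples=2000, n_features=8, n_informative=5,
                           weights=[0.9, 0.1], random_state=1)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=1)
# Hold out part of the training data to choose alpha without touching the test set
X_fit, X_val, y_fit, y_val = train_test_split(
    X_train, y_train, test_size=0.25, stratify=y_train, random_state=1)

# Effective alphas at which subtrees of the fully grown tree are pruned away
path = DecisionTreeClassifier(random_state=1).cost_complexity_pruning_path(X_fit, y_fit)
ccp_alphas = path.ccp_alphas[:-1]  # the largest alpha prunes the tree down to its root

# One tree per alpha: a larger alpha means stronger pruning and a smaller tree
trees = [DecisionTreeClassifier(random_state=1, ccp_alpha=alpha).fit(X_fit, y_fit)
         for alpha in ccp_alphas]
val_recall = [recall_score(y_val, t.predict(X_val)) for t in trees]

# Keep the tree with the best validation recall (the first one, i.e. smallest alpha, on ties)
best_index = max(range(len(trees)), key=lambda i: val_recall[i])
best_tree = trees[best_index]
print("alpha:", ccp_alphas[best_index], "leaves:", best_tree.get_n_leaves())
print("test recall:", recall_score(y_test, best_tree.predict(X_test)))
```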

Checking performance on test set

Visualizing the Decision Tree

Insights & Recommendations

  1. The analysis was done using different techniques, including logistic regression and a Decision Tree Classifier.
  2. A model was built to predict whether a customer will take a personal loan.
  3. Different trees and their confusion matrices were visualized.
  4. Income, credit card spending, higher education and family size are critical decision factors.
  5. Pruning was done to reduce overfitting.

According to the final decision tree model:

  1. Income is the most important factor in deciding whether a customer will take a personal loan.
  2. Customers with education level 2 or 3 are more likely to take a personal loan.
  3. Customers with a bigger family are more likely to take a personal loan.
  4. Customers with higher credit card spending are more likely to take a personal loan and can be approached as probable candidates.
  5. Customers younger than 25 or older than 65 don't take personal loans. Either the bank is not considering them for loans or they are not interested in personal loans. The marketing team needs to focus on this segment.
  6. Customers with 41 or more years of experience don't have personal loans. They may be old enough that the bank is not considering them for personal loans.
  7. Customers with a mortgage should be considered probable candidates for a personal loan by the marketing team.